Methods for labeling error detection in microarrays based on the effect of data perturbation on the regression model
Abstract
MOTIVATION: Mislabeled samples often appear in gene expression profiles because different disease subtypes can be similar and because diagnoses are partly subjective and sometimes wrong. Mislabeled samples degrade supervised learning. The LOOE-sensitivity algorithm detects mislabeled microarray samples through data perturbation, but it performs poorly because it does not measure the effect of the perturbation. The purpose of this article is to design a novel method for detecting mislabeled microarray samples that exploits a measure of the effect of data perturbation.

RESULTS: To measure the effect of data perturbation, we define an index named the perturbing influence value (PIV), based on the support vector machine (SVM) regression model. Three algorithms based on the PIV are proposed to detect mislabeled samples: the Column Algorithm (CAPIV), the Row Algorithm (RAPIV) and the progressive Row Algorithm (PRAPIV). Experimental results on six artificial datasets and five microarray datasets show that all of the proposed methods outperform LOOE-sensitivity. Moreover, compared with a simple SVM and with CL-stability, the PRAPIV algorithm achieves higher precision while retaining high recall.

AVAILABILITY: The program and source code (in Java) are publicly available at http://ccst.jlu.edu.cn/CSBG/PIVS/index.htm
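The abstract names the perturbing influence value (PIV) but does not define it, so the following is only a minimal sketch of the general idea of perturbation-based label checking, under stated assumptions: class labels are encoded as +1/-1 regression targets, a Nadaraya-Watson kernel smoother replaces the SVM regression model to keep the example dependency-free, and the flip-one-label score, the function names (`rbf`, `kernel_fit`, `flip_scores`) and the `gamma` value are hypothetical. It is not the PIV index or any of the CAPIV, RAPIV or PRAPIV algorithms.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def kernel_fit(X, y, gamma=1.0):
    """Fitted values of a Nadaraya-Watson kernel smoother at every training
    point. Hypothetical stand-in for the SVM regression model in the paper."""
    fitted = []
    for xi in X:
        w = [rbf(xi, xj, gamma) for xj in X]  # self-weight is 1, so sum(w) > 0
        fitted.append(sum(wj * yj for wj, yj in zip(w, y)) / sum(w))
    return fitted


def flip_scores(X, y, gamma=1.0):
    """Hypothetical perturbation score (NOT the paper's PIV): for each sample i,
    flip its +1/-1 label, recompute the fitted values, and measure how much the
    squared error on all OTHER samples drops. A positive score means the rest
    of the data is fitted better with label i flipped, so label i looks
    suspicious."""
    base = kernel_fit(X, y, gamma)
    scores = []
    for i in range(len(y)):
        y_pert = list(y)
        y_pert[i] = -y_pert[i]  # the data perturbation: flip one label
        pert = kernel_fit(X, y_pert, gamma)
        delta = sum(
            (base[j] - y[j]) ** 2 - (pert[j] - y[j]) ** 2
            for j in range(len(y))
            if j != i
        )
        scores.append(delta)
    return scores


# Toy data: two tight clusters; sample 3 sits in the +1 cluster but is labeled -1.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [3.1, 3.1]]
y = [1, 1, 1, -1, -1, -1, -1, -1]

scores = flip_scores(X, y)
ranking = sorted(range(len(y)), key=lambda k: scores[k], reverse=True)
print("most suspicious sample:", ranking[0])
```

On this toy data only sample 3 receives a positive score. A real detector would swap the smoother for an SVM regressor (for example an epsilon-SVR) and would need a rule for how many top-ranked samples to flag; the abstract does not say how PIV, CAPIV, RAPIV or PRAPIV handle either choice.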
Journal:
- Bioinformatics
Volume 25, Issue 20
Publication date: 2009